150 research outputs found

    Bayesian reordering model with feature selection

    No full text
    In phrase-based statistical machine translation systems, variation in grammatical structures between source and target languages can cause large movements of phrases. Modeling such movements is crucial in achieving translations of long sentences that appear natural in the target language. We explore a generative learning approach to phrase reordering for Arabic-to-English translation. Formulating the reordering problem as a classification problem and using naive Bayes with feature selection, we achieve an improvement in the BLEU score over a lexicalized reordering model. The proposed model is compact, fast and scalable to a large corpus.

    A Review of Codebook Models in Patch-Based Visual Object Recognition

    No full text
    The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to provide a way to map the low-level features into a fixed-length vector in histogram space to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step, which is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. Clustering is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook for constructing a discriminant codebook in a one-pass design procedure that slightly outperforms more traditional approaches at drastically reduced computing times. In this review we survey several approaches that have been proposed over the last decade, covering their use of feature detectors, descriptors, codebook construction schemes, choice of classifiers in recognising objects, and the datasets that were used in evaluating the proposed methods.
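    The mapping from low-level features to a fixed-length histogram described above can be sketched as follows. This is a minimal illustration, assuming hard nearest-codeword assignment with Euclidean distance and L1 normalisation; the function name `encode_histogram` and the toy data are hypothetical, and the codebook is taken as given rather than built by any of the surveyed schemes.

```python
import numpy as np

def encode_histogram(descriptors, codebook):
    """Map a variable-size set of local descriptors to a fixed-length histogram."""
    # Squared Euclidean distance from every descriptor to every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Hard assignment: each descriptor votes for its nearest codeword.
    nearest = np.argmin(dists, axis=1)
    counts = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    # L1-normalise so images with different numbers of patches are comparable.
    total = counts.sum()
    return counts / total if total > 0 else counts

# Hypothetical toy data: 3 codewords and 4 descriptors in 2-D.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
hist = encode_histogram(descriptors, codebook)
```

    The histogram length equals the codebook size regardless of how many descriptors an image yields, which is what allows a standard classifier to be applied directly.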

    Mining protein database using machine learning techniques

    No full text
    With a large amount of information relating to proteins accumulating in databases widely available online, it is of interest to apply machine learning techniques that, by extracting underlying statistical regularities in the data, make predictions about the functional and evolutionary characteristics of unseen proteins. Such predictions can help reduce the space over which experiment designers need to search in order to improve our understanding of the biochemical properties. Previously it has been suggested that an integration of features computable by comparing a pair of proteins can be achieved by an artificial neural network, hence predicting the degree to which they may be evolutionarily related and homologous. We compiled two datasets of pairs of proteins, each pair being characterised by seven distinct features. We performed an exhaustive search through all possible combinations of features for the problem of separating remote homologous from analogous pairs, and we note that a significant performance gain was obtained by the inclusion of sequence and structure information. We find that a linear classifier was enough to discriminate a protein pair at the family level. At the superfamily level, however, detecting remote homologous pairs was a harder problem, and we find that nonlinear classifiers achieve significantly higher accuracies. In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins, and from extensive cross-validation and feature-selection studies we quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.

    Machine Learning for Intrusion Detection: Modeling the Distribution Shift

    No full text
    This paper addresses two important issues that arise in formulating and solving computer intrusion detection as a machine learning problem, a topic that has attracted considerable attention in recent years, including a community-wide competition using a common data set known as the KDD Cup ’99. The first of these problems is the size of the data set, 5 × 10^6 by 41 features, which makes conventional learning algorithms impractical. In previous work, we introduced a one-pass non-parametric classification technique called Voted Spheres, which carves up the input space into a series of overlapping hyperspheres. Training data seen within each hypersphere is used in a voting scheme during testing on unseen data. Secondly, we address the problem of distribution shift, whereby the training and test data may be drawn from slightly different probability densities while the conditional densities of class membership for a given datum remain the same. We adopt two recent techniques from the literature, density weighting and kernel mean matching, to enhance the Voted Spheres technique to deal with such distribution disparities. We demonstrate that substantial performance gains can be achieved using these techniques on the KDD Cup data set.
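    The distribution-shift setting described above (different input densities, identical class-conditional densities) is commonly written as below. The symbols p_tr, p_te, w, f and the loss ℓ are notation assumed here for illustration, not taken from the paper; how the weights enter the Voted Spheres voting scheme is not stated in the abstract and is not shown.

```latex
% Assumption of the setting: inputs shift, class conditionals do not.
p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x),
\qquad
p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x).

% Importance weighting: test risk expressed as a weighted training risk,
% valid when p_tr(x) > 0 wherever p_te(x) > 0.
\mathbb{E}_{(x,y) \sim p_{\mathrm{te}}}\bigl[\ell(f(x), y)\bigr]
  = \mathbb{E}_{(x,y) \sim p_{\mathrm{tr}}}\bigl[w(x)\,\ell(f(x), y)\bigr],
\qquad
w(x) = \frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}.
```

    Density weighting obtains w(x) by estimating the two densities and taking their ratio, whereas kernel mean matching estimates the weights directly by matching the weighted training mean to the test mean in a reproducing kernel Hilbert space.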

    Unsupervised Texture Segmentation using Active Contours and Local Distributions of Gaussian Markov Random Field Parameters

    No full text
    In this paper, local distributions of low order Gaussian Markov Random Field (GMRF) model parameters are proposed as texture features for unsupervised texture segmentation. Instead of using model parameters as texture features, we exploit the variations in parameter estimates found by model fitting in a local region around the given pixel. The spatially localized estimation process is carried out by the maximum likelihood method employing a moderately small estimation window, which leads to modeling of partial texture characteristics belonging to the local region. Hence significant fluctuations occur in the estimates, which can be related to texture pattern complexity. The variations in the estimates are quantified by normalized local histograms. Selection of an accurate window size for histogram calculation is crucial and is achieved by a technique based on the entropy of textures. These texture features expand the possibility of using relatively low order GMRF model parameters for segmenting fine to very large texture patterns and offer lower computational cost. Small estimation windows result in better boundary localization. Unsupervised segmentation is performed by integrated active contours, combining the region and boundary information. Experimental results on statistical and structural component textures show improved discriminative ability of the features compared to some recent algorithms in the literature.

    Approximate low-rank factorization with structured factors

    No full text
    An approximate rank-revealing factorization problem with structure constraints on the normalized factors is considered. Examples of structure, motivated by an application in microarray data analysis, are sparsity, nonnegativity, periodicity, and smoothness. In general, the approximate rank-revealing factorization problem is nonconvex. An alternating projections algorithm is developed, which is globally convergent to a locally optimal solution. Although the algorithm is developed for a specific application in microarray data analysis, the approach is applicable to other types of structure.

    An Inhomogeneous Bayesian Texture Model for Spatially Varying Parameter Estimation

    No full text
    In statistical model based texture feature extraction, features based on spatially varying parameters achieve higher discriminative performance compared to spatially constant parameters. In this paper we formulate a novel Bayesian framework which achieves texture characterization by spatially varying parameters based on Gaussian Markov random fields. The parameter estimation is carried out by the Metropolis-Hastings algorithm. The distributions of estimated spatially varying parameters are then used as successful discriminant texture features in classification and segmentation. Results show that the novel features outperform traditional Gaussian Markov random field texture features which use spatially constant parameters. These features capture both pixel spatial dependencies and structural properties of a texture, giving improved texture features for effective texture classification and segmentation.
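    For readers unfamiliar with the Metropolis-Hastings step mentioned above, a generic random-walk sampler can be sketched as follows. This is only an illustration of the general algorithm, not the paper's estimator: the scalar parameter, the stand-in Gaussian target, the function name `metropolis_hastings`, and the step size and burn-in length are all assumptions made for this example.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings sampler for a scalar parameter."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, log_p = x0, log_target(x0)
    for i in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = x + step * rng.normal()
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples[i] = x
    return samples

# Hypothetical stand-in target: unnormalised log-density of N(2, 1).
def log_target(theta):
    return -0.5 * (theta - 2.0) ** 2

samples = metropolis_hastings(log_target, x0=0.0, n_samples=20000)
posterior_mean = samples[5000:].mean()  # discard the first 5000 draws as burn-in
```

    Only an unnormalised log-density is needed, which is why the method suits Bayesian parameter estimation where the normalising constant is unknown.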